feat!: make voice receivers return AudioPacket classes instead of just Audio by petergeneric · Pull Request #11432 · discordjs/discord.js

petergeneric · 2026-02-27T23:11:13Z

Currently, RTP packet headers are stripped and clients receive only Opus frames as Buffers. Without the RTP timestamp, clients have to fall back on wall-clock delivery time, which makes jitter and drift hard to correct.

This change introduces an AudioPacket interface which extends Buffer with the following read-only fields:

sequence (16-bit uint counter that increases by one per packet and wraps after 65535; lets consumers identify out-of-order RTP packets)
timestamp (32-bit uint RTP timestamp set by the encoder; RFC 7587, the RTP payload format for Opus, requires a 48 kHz clock regardless of the audio sample rate)
ssrc (lets consumers detect a change and reset their Opus decoder state; updates to this value are already available via SSRCMap events, but I think it also belongs on AudioPacket: SSRCMap events may be delivered late, and the client needs to know as soon as a packet with a new SSRC arrives so it can reset its decoder state and parse that packet correctly)
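
As a rough illustration of how a consumer might use these three fields, here is a minimal sketch. The `AudioPacket` shape is inferred only from the list above, and `sequenceDelta`, `timestampDeltaMs`, and `classify` are hypothetical helper names, not part of this PR or of `@discordjs/voice`.

```typescript
import { Buffer } from 'node:buffer';

// Assumed shape, based only on the field list above; the PR's real declaration may differ.
interface AudioPacket extends Buffer {
	readonly sequence: number; // 16-bit RTP sequence number
	readonly ssrc: number; // 32-bit synchronisation source identifier
	readonly timestamp: number; // 32-bit RTP timestamp, 48 kHz clock for Opus
}

type RtpMeta = Pick<AudioPacket, 'sequence' | 'ssrc' | 'timestamp'>;

// Hypothetical helper: signed distance between two 16-bit sequence numbers,
// tolerant of the counter wrapping from 65535 back to 0.
function sequenceDelta(previous: number, current: number): number {
	const diff = (current - previous) & 0xffff;
	return diff >= 0x8000 ? diff - 0x10000 : diff;
}

// Hypothetical helper: media time in milliseconds between two RTP timestamps.
// At 48 kHz there are 48 ticks per millisecond; the 32-bit wrap is handled too.
function timestampDeltaMs(previous: number, current: number): number {
	const diff = (current - previous) >>> 0;
	return (diff >= 0x80000000 ? diff - 0x100000000 : diff) / 48;
}

type Verdict = 'gap' | 'in-order' | 'late-or-duplicate' | 'reset-decoder';

// Hypothetical helper: decide what a consumer should do with the next packet.
function classify(previous: RtpMeta | undefined, current: RtpMeta): Verdict {
	if (!previous || previous.ssrc !== current.ssrc) return 'reset-decoder';
	const delta = sequenceDelta(previous.sequence, current.sequence);
	if (delta === 1) return 'in-order';
	return delta > 1 ? 'gap' : 'late-or-duplicate';
}
```

A consumer could keep the metadata of the last packet it decoded and call `classify` on each new one; comparing `timestampDeltaMs` against elapsed wall-clock time is one possible way to estimate drift.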

To keep this PR small and focused on the essential fields, this change deliberately does not attempt to provide a generalised parser for every field in the RTP header.

…nd SSRC. Refactored so that instead of pure Buffer, we now send AudioPacket (interface extending Buffer) which has readonly fields sequence, timestamp, and ssrc.

…e, timestamp, ssrc)

coderabbitai · 2026-02-27T23:15:41Z

📝 Walkthrough

The changes extend audio packet handling in the voice receiver to include RTP metadata (sequence number, timestamp, and SSRC). A new public AudioPacket interface is introduced to expose this metadata, and the VoiceReceiver now wraps decrypted packets with these fields before streaming. Tests validate metadata extraction and backward compatibility.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| AudioPacket Interface Definition `packages/voice/src/receive/AudioReceiveStream.ts` | Introduces new public interface `AudioPacket` extending `Buffer` with readonly properties for RTP sequence (16-bit), timestamp (32-bit), and SSRC (32-bit) identifiers. |
| Packet Metadata Wrapping `packages/voice/src/receive/VoiceReceiver.ts` | Adds internal `createAudioPacket` helper function to attach RTP metadata as non-enumerable properties to decrypted buffers. Extracts sequence and timestamp from incoming UDP packet headers and wraps packets before streaming. |
| Metadata Extraction Tests `packages/voice/__tests__/VoiceReceiver.test.ts` | Adds two new tests validating RTP metadata extraction from desktop/mobile RTP packets and confirming backward compatibility with existing packet handling. |

Sequence Diagram

sequenceDiagram
    actor UDP as UDP Source
    participant VR as VoiceReceiver
    participant CAP as createAudioPacket
    participant ARS as AudioReceiveStream
    
    UDP->>VR: onUdpMessage (encrypted packet)
    Note over VR: Extract RTP sequence,<br/>timestamp from bytes
    VR->>VR: Decrypt packet payload
    VR->>CAP: createAudioPacket(buffer,<br/>sequence, timestamp, ssrc)
    CAP->>CAP: Attach metadata as<br/>non-enumerable properties
    CAP-->>VR: AudioPacket (Buffer + metadata)
    VR->>ARS: stream.push(AudioPacket)
    Note over ARS: Consumer receives<br/>Buffer with metadata
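
The wrapping step in the diagram could look roughly like the sketch below. It only assumes what the summary states (a helper named `createAudioPacket` that attaches the metadata as non-enumerable properties); the body, the parameter order beyond what the diagram shows, and the `AudioPacket` declaration are illustrative guesses rather than the PR's actual code.

```typescript
import { Buffer } from 'node:buffer';

// Assumed shape of the public interface; the real declaration may differ.
interface AudioPacket extends Buffer {
	readonly sequence: number;
	readonly ssrc: number;
	readonly timestamp: number;
}

// Illustrative sketch: attach RTP metadata to the decrypted buffer in place.
// The properties are non-enumerable and non-writable, so code that treats the
// value as a plain Buffer (iteration, Object.keys, length) sees no difference.
function createAudioPacket(buffer: Buffer, sequence: number, timestamp: number, ssrc: number): AudioPacket {
	return Object.defineProperties(buffer, {
		sequence: { value: sequence, enumerable: false, writable: false },
		timestamp: { value: timestamp, enumerable: false, writable: false },
		ssrc: { value: ssrc, enumerable: false, writable: false },
	}) as AudioPacket;
}

// Usage with made-up payload bytes and metadata values.
const packet = createAudioPacket(Buffer.from([0xf8, 0xff, 0xfe]), 1, 960, 42);
```

Because the same `Buffer` instance is returned, existing consumers that only read bytes should keep working, while newer consumers can read `packet.sequence`, `packet.timestamp`, and `packet.ssrc`.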

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description clearly explains the motivation (RTP headers were previously stripped), the solution (AudioPacket interface with sequence, timestamp, ssrc fields), and the rationale for scope limitation. |
| Title check | ✅ Passed | The title states 'make voice receivers return AudioPacket classes instead of just Audio', but the actual change introduces an AudioPacket interface (not a class) that wraps RTP header metadata into decrypted audio packets. The title is partially related but uses imprecise terminology ('classes' vs interface) and doesn't capture the core purpose: exposing RTP header fields (sequence, timestamp, ssrc). |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/voice/src/receive/AudioReceiveStream.ts`:
- Around line 39-41: Update the documentation for the AudioPacket buffer to
state it contains an Opus-encoded payload (with RTP header metadata) rather than
a "decoded Opus packet"; locate the AudioPacket type/comment in
AudioReceiveStream (or the AudioPacket JSDoc) and change the wording to
explicitly say "a Buffer containing an Opus-encoded payload with RTP header
metadata" so API consumers aren't misled.

In `@packages/voice/src/receive/VoiceReceiver.ts`:
- Around line 177-180: The RTP header reads in VoiceReceiver (variables
sequence, timestamp, ssrc reading from msg) can throw for 9–11 byte buffers;
change the early length guard to require at least 12 bytes (e.g., if (msg.length
< 12) return;) or move these reads inside the existing try that wraps
parsePacket so any RangeError is caught; update the check or relocate the reads
in the VoiceReceiver.ts function that computes sequence/timestamp/ssrc to ensure
no unhandled RangeError occurs.
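
The length guard suggested above can be sketched as follows. The offsets are those of the fixed RTP header in RFC 3550 (sequence at byte 2, timestamp at byte 4, SSRC at byte 8); `readRtpHeaderFields` and `RTP_FIXED_HEADER_LENGTH` are hypothetical names, and the real `VoiceReceiver` code may structure this differently.

```typescript
import { Buffer } from 'node:buffer';

// The fixed RTP header is 12 bytes long (RFC 3550).
const RTP_FIXED_HEADER_LENGTH = 12;

interface RtpHeaderFields {
	sequence: number;
	ssrc: number;
	timestamp: number;
}

// Hypothetical helper: returns undefined instead of throwing a RangeError
// when the datagram is too short to contain a full fixed header.
function readRtpHeaderFields(msg: Buffer): RtpHeaderFields | undefined {
	if (msg.length < RTP_FIXED_HEADER_LENGTH) return undefined;

	return {
		sequence: msg.readUInt16BE(2), // bytes 2-3
		timestamp: msg.readUInt32BE(4), // bytes 4-7
		ssrc: msg.readUInt32BE(8), // bytes 8-11
	};
}
```

Returning `undefined` for short buffers keeps the caller's early-return style; wrapping the reads in the existing `try` would be the other option the comment mentions.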

📥 Commits

Reviewing files that changed from the base of the PR and between 0cb8be4 and f04d08b.

📒 Files selected for processing (3)

packages/voice/__tests__/VoiceReceiver.test.ts
packages/voice/src/receive/AudioReceiveStream.ts
packages/voice/src/receive/VoiceReceiver.ts

📜 Review details

🧰 Additional context used

🧬 Code graph analysis (2)

packages/voice/src/receive/VoiceReceiver.ts (1)

packages/voice/src/receive/AudioReceiveStream.ts (1): AudioPacket (42-59)

packages/voice/__tests__/VoiceReceiver.test.ts (1)

packages/voice/__mocks__/rtp.ts (3): RTP_PACKET_DESKTOP (7-16), RTP_PACKET_CHROME (18-26), RTP_PACKET_ANDROID (28-37)

🔇 Additional comments (2)

packages/voice/src/receive/VoiceReceiver.ts (1)

23-30: createAudioPacket wrapper is clean and backward-compatible.

Non-enumerable readonly metadata on Buffer is a good approach for preserving legacy buffer behavior.

packages/voice/__tests__/VoiceReceiver.test.ts (1)

71-110: Great coverage additions for metadata passthrough and compatibility.

These tests validate both RTP header field extraction and Buffer backward compatibility across multiple packet variants.

packages/voice/src/receive/AudioReceiveStream.ts